Multi-Core Ant Colony Optimization for TSP in Haskell

This is a continuation of Parallel Ant Colony Optimization for TSP in Standard ML; see that earlier page for a description of the Ant Colony algorithm and the TSP problem. Here the program has been rewritten in Haskell. See Multi-Core Ant Colony Optimization for TSP in Erlang for further performance and language comparisons.

The Haskell language is potentially interesting for parallel programming. Programs written in Haskell are closer to high-level declarative descriptions of the problem than programs written in traditional imperative languages. With another high-level declarative language, SQL, it has proved possible to extract parallelism automatically, without the programmer explicitly creating and assigning threads.

The Glasgow Haskell Compiler has been extended to support parallel programming. While these extensions were originally designed for PVM clusters, the GHC 6.5 development release supports them on SMP and multi-core machines (see Multiprocessor GHC). The extensions provide primitives for controlling parallel evaluation: par sparks the evaluation of an expression, which the runtime may carry out on another core, while seq forces its first argument to be evaluated before returning its second, keeping that work on the current thread. Sparks are only hints; the runtime system is free to ignore them. The number of cores (OS threads) actually used is set with the -N# command line parameter (e.g. +RTS -N2). I have used par and seq in my Ant Colony implementation.
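
As a minimal sketch of the idiom (not the actual code from my implementation; bestLength, tourLength and the two tour lists are hypothetical stand-ins), one sub-computation is sparked with par while seq forces the other on the current thread:

```haskell
import Control.Parallel (par)

-- Sketch: find the best length over two groups of ant tours, scoring
-- the groups in parallel.  `par` sparks the evaluation of `left`, which
-- the runtime may pick up on a second core, while `seq` forces `right`
-- on the current thread before the two results are compared.
bestLength :: ([Int] -> Double) -> [[Int]] -> [[Int]] -> Double
bestLength tourLength toursA toursB =
    left `par` (right `seq` min left right)
  where
    left  = minimum (map tourLength toursA)
    right = minimum (map tourLength toursB)
```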

Other, more abstract mechanisms for exploiting parallelism in Haskell have also been developed. Strategies, introduced in the paper Algorithm + Strategy = Parallelism, allow the algorithm to be separated from its parallel behavior. The GHC module Control.Parallel.Strategies provides several example strategies, along with parallel variants of basic functions that use them. One is parMap, which, like the ordinary map, applies a function to every element of a list, but evaluates the applications in parallel. Using parMap I obtained timings and results identical to those from the explicit par and seq version above.
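
With Strategies the same kind of code needs no explicit par/seq plumbing. The Strategy names have changed somewhat between the GHC 6.5-era module and current releases; here is a minimal sketch against the current Control.Parallel.Strategies API, with tourLength again a hypothetical stand-in for the program's tour-scoring function:

```haskell
import Control.Parallel.Strategies (parMap, rseq)

-- Sketch: score every ant's tour in parallel.  parMap behaves like map,
-- but each result is evaluated in parallel according to the supplied
-- Strategy; rseq evaluates to weak head normal form, which for a Double
-- means full evaluation.
allTourLengths :: ([Int] -> Double) -> [[Int]] -> [Double]
allTourLengths tourLength tours = parMap rseq tourLength tours
```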

My Haskell source code is here. For comparison I also tested the Standard ML version using the MLton compiler; my MLton SML source is here. MLton does not support multiple cores.

language    | cities | iterations (ants) | cores | heap memory     | time (secs) | % CPU time GC | % elapsed time GC | speedup
Haskell     | 22     | 4000              | 1     | default         | 6.6         | 39            | 39                |
Haskell     | 22     | 4000              | 2     | default         | 5.3         | 42            | 58                | 1.25
Haskell     | 22     | 4000              | 1     | 1500M           | 5.7         | 8             | 7                 |
Haskell     | 22     | 4000              | 2     | 1500M           | 2.9         | 2             | 3                 | 1.97
Standard ML | 22     | 4000              | 1     | default or 200K | 0.32        | 0             | 0                 |

All tests were performed on an AMD Athlon 64 X2 4400+ dual-core CPU (2.2 GHz) with 2048 MB of memory, running Debian Linux 2.6.15-1-k7-smp, GHC 6.5.20060620 (compiled with -O; -H1500m at runtime for the 1500M heap runs), and MLton 20060213-1.

The speedup for dual-core Haskell was strongly influenced by the amount of available memory. Garbage collection in SMP GHC is single-threaded (see the paper Haskell on a shared-memory multiprocessor), so the proportion of execution time spent in garbage collection limits the achievable speedup. When enough memory is made available, almost perfect speedup (1.97) is achieved. On some test runs garbage collection kicked in earlier and a speedup of only 1.5 was obtained. Haskell consumes much more memory than Standard ML because Haskell data structures are not destructively updated; new copies are allocated instead, which increases the garbage-collection load.
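
As a rough back-of-the-envelope reading of the table above (my own decomposition of the measured times, not separate measurements): with the default heap, about 0.39 x 6.6 ≈ 2.6 s of the single-core run is garbage collection and ≈ 4.0 s is mutator work; on two cores the mutator portion falls to roughly (1 - 0.58) x 5.3 ≈ 2.2 s, but the single-threaded collector's share grows to ≈ 3.1 s, which is what holds the speedup to 1.25. With -H1500m the collector accounts for well under 10% of the run, so the elapsed time nearly halves (5.7 s to 2.9 s, a speedup of 1.97).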

Here are CPU and memory utilization traces for the four Haskell test runs in order: